Assembling a parallel corpus from RSS news feeds
Abstract
We describe our use of RSS news feeds to quickly assemble a parallel English-Japanese corpus. Our method is simpler than other web mining approaches, and it produces a parallel corpus whose quality, quantity, and rate of growth are stable and predictable.
Similar resources
Automated System for Improving RSS Feeds Data Quality
Nowadays, the majority of RSS feeds provide incomplete information about their news items. The lack of information leads to engagement loss in users. We present a new automated system for improving the RSS feeds’ data quality. RSS feeds provide a list of the latest news items ordered by date. Therefore, it makes it easy for a web crawler to precisely locate the item and extract its raw content....
Synthesizing correlated RSS news articles based on a fuzzy equivalence relation
Tens of thousands of news articles are posted on-line each day, covering topics from politics to science to current events. To better cope with this overwhelming volume of information, RSS (news) feeds are used to categorize newly posted articles. Nonetheless, most RSS users must filter through many articles within the same or different RSS feeds to locate articles pertaining to their particula...
Generating Fuzzy Equivalence Classes on RSS News Articles for Retrieving Correlated Information
Tens of thousands of news articles are posted on-line each day, covering topics from politics to science to current events. In order to better cope with this overwhelming volume of information, RSS (news) feeds are used to categorize newly posted articles. Nonetheless, most RSS users must filter through many articles within the same or different RSS feeds in order to locate articles pertaining ...
Matt Fuller
Traditionally users subscribe to RSS feeds of interest using an RSS feed reader. The RSS feed reader periodically polls the subscribed feeds for updates or items to be displayed to the user. Many RSS feeds usually pertain to a single news source or blog. Others may aggregate various feeds usually on some topic and produce a single RSS feed. Middleware publish-subscribe systems allow users to sub...
(X)querying RSS/Atom Feeds Extracted from News Web Sites: a Cocoon-based Portal
The Web is fast becoming the predominant source of news and information for many people. In the past few years, a new delivery system has emerged in the form of RSS feeds. Such feeds normally provide a brief summary of a longer news item posted on the Web. RSS feeds, collected to form “channels” according to some thematic criteria, can be accessed using Web browsers or specialized software called “news a...
Publication date: 2005